Semantic Clone Detection via Probabilistic Software Modeling
Semantic clone detection is the process of finding program elements with
similar or equal runtime behavior. One example is detecting the semantic equality
between the recursive and iterative implementations of the factorial
computation. Semantic clone detection is the de facto technical boundary of
clone detectors. This boundary has been tested in recent years with interesting
new approaches. This work contributes a semantic clone detection approach that
detects clones with 0% syntactic similarity. We present Semantic Clone
Detection via Probabilistic Software Modeling (SCD-PSM) as a stable and precise
solution to semantic clone detection. PSM builds a probabilistic model of a
program that is capable of evaluating and generating runtime data. SCD-PSM
leverages this model and its model elements to find behaviorally equal model
elements. This behavioral equality is then generalized to semantic equality of
the original program elements. It uses the likelihood between model elements as
a distance metric. Then, it employs the likelihood ratio significance test to
decide whether this distance is significant, given a pre-specified and
controllable false-positive rate. The output of SCD-PSM consists of pairs of program
elements (i.e., methods), their distance, and a decision whether they are
clones or not. SCD-PSM yields excellent results with a Matthews Correlation
Coefficient greater than 0.9. These results are obtained on classical semantic clone
detection problems such as detecting recursive and iterative versions of an
algorithm, but also on complex problems used in coding competitions.
Comment: 12 pages, 2 pages of references, 5 listings, 2 figures, 4 tables
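The factorial example mentioned in the abstract can be made concrete with a minimal sketch. The two functions below share almost no syntax yet compute the same values, which is what makes them semantic clones. The helper `behave_equally` is a hypothetical name introduced here for illustration; it compares exact outputs on sampled inputs and is not SCD-PSM itself, which, according to the abstract, compares probabilistic models of runtime data using a likelihood ratio test.

```python
def factorial_recursive(n: int) -> int:
    """Factorial via self-reference: n! = n * (n - 1)!."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1 if n <= 1 else n * factorial_recursive(n - 1)


def factorial_iterative(n: int) -> int:
    """Factorial via an accumulating loop."""
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


def behave_equally(f, g, inputs) -> bool:
    """Hypothetical helper: True if f and g agree on every sampled input.

    This is a naive stand-in for behavioral equality, not the
    likelihood-based comparison described in the abstract.
    """
    return all(f(x) == g(x) for x in inputs)


sample_inputs = range(0, 20)
print(behave_equally(factorial_recursive, factorial_iterative, sample_inputs))
```

Exact output comparison of this kind only works for deterministic functions on inputs that happen to be sampled; a statistical test with a controllable false-positive rate, as the abstract describes, is one way to turn such evidence into a clone decision.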